Almathera Ten Pack 2: CDPD 1

Text File | 1995-03-14 | 8KB | 199 lines

All timings were run on an A1000 with a 68010, 4Meg of fast RAM and 1/2Meg of chip RAM, off of a Supra 60 drive. My standard work environment was in place: an interlaced, morerow'ed WB screen and a 50K stack, with the following active processes:

    Task  Pri  Address  Command                  Directory
    1     0    251948   jobs                     src:treewalk
    2     0    250878   emacs                    src:treewalk
    3     0    213e40   SupraMount
    4     20   24fb38   bin:startprogs/machII    RAM DISK:
    6     0    2552f0   bin:startprogs/wicon     RAM DISK:

In addition, snipit, ARexx 1.10, srt, FF, conman, installbeep, the WB and mymenu were in place.

The current directory was src:treewalk, and the directory tree scanned was rooted at src:tmp, consisting of about 20Meg of random stuff. It wasn't changed throughout.

Find is a PD find, available on Fish disk 197. Treewalk is the binary included in this distribution. Files is Lattice's files command, version 1.01, from Lattice C 5.02. Though files and treewalk are residentable, find is not; therefore, all three commands were run non-resident to level the field. The object was to measure the algorithms used, not the implementation details.

First test: walking a large tree. Timings labeled "no output" were made from Rexx, via a script that ran each command 10 times and printed, after each run, the time elapsed during that run in seconds. The output was run through "grep -v src:" to throw away everything but the timings and error messages. Timings labeled "output" consisted of one run, with the output going to standard out.

    find src:tmp -print
        output:    115.62
        no output: 28.04 28.00 28.14 28.08 28.10 28.02 28.02 28.08 28.00 28.18
        average:   28.066

    treewalk dir src:tmp
        output:    122.48
        no output: 46.66 46.54 46.78 46.60 46.48 46.54 46.74 46.72 46.68 46.46
        average:   46.620

    files src:tmp
        output:    240.70
        no output: 163.58 163.64 163.50 163.30 163.54 163.96 164.10 163.44 163.00 163.36
        average:   163.542

Note: files complained about multiple directories having too many files or being empty. It doesn't state which.
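The averages above, and the percentage-of-slowest figures in the statistics table further on, are plain arithmetic. As a minimal sketch, the Python below reproduces the first test's "no output" averages and turns them into percentages of the slowest program. The names RUNS, average and percent_of_slowest are illustrative only, and rounding to the nearest whole percent is an assumption about how the table was produced.

```python
# First test, "no output" runs (seconds), copied from the listing above.
RUNS = {
    "find": [28.04, 28.00, 28.14, 28.08, 28.10, 28.02, 28.02, 28.08, 28.00, 28.18],
    "treewalk": [46.66, 46.54, 46.78, 46.60, 46.48, 46.54, 46.74, 46.72, 46.68, 46.46],
    "files": [163.58, 163.64, 163.50, 163.30, 163.54, 163.96, 164.10, 163.44, 163.00, 163.36],
}


def average(times):
    """Mean running time of a list of runs."""
    return sum(times) / len(times)


def percent_of_slowest(times_by_program):
    """Map each program to its time as a percentage of the slowest program's time.

    Rounds to the nearest whole percent (an assumption about how the table was made).
    """
    slowest = max(times_by_program.values())
    return {name: round(100 * t / slowest) for name, t in times_by_program.items()}


averages = {name: average(runs) for name, runs in RUNS.items()}
for name, avg in averages.items():
    print(f"{name:9} average: {avg:.3f}")
print(percent_of_slowest(averages))
```

Fed the single-run "output" times (115.62, 122.48 and 240.70), the same function gives 48, 51 and 100, matching the first row of the statistics table.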
Second test: listing the files in a large file system that need to be backed up. This was lifted from my backup script, which normally processes the output. To make this more realistic, the output was run through "grep -v src:", which matches the actual use during backup (where the output is run into execio for processing by the Rexx backup script). Once again, 10 iterations were run. Note: only two files actually met the entire selection criteria, which isn't unusual.

    find src:tmp -type f -newer src:last-backup ! -name *~ ! -name *.o -print
        runs:    24.88 25.06 25.06 25.08 25.12 25.10 25.16 25.10 25.22 25.20
        average: 25.098

    treewalk dir src:tmp filter "file && src:last-backup.date < date && !(filename *= '*~' || filename *= '*.o')"
        runs:    25.24 25.42 25.32 25.48 25.48 25.52 25.34 25.24 25.42 25.38
        average: 25.384

Files is unable to perform this search, as it lacks the ability to test for files not matching a name.

Third test: cleaning up a large working directory. A copy of the tree was created, and the copy was deleted in two passes: first, all files matching "*.o" were deleted, and then everything else was deleted. The deletion utility is "rm", a version of delete without the limit on the number of arguments, so treewalk does not have to invoke the command multiple times. While this may seem unfair to find, part of the purpose of creating treewalk was to overcome this disability in find. To make the test realistic, rm was resident for all runs of the test. Files has an option to delete the files it finds, which was used so that files would run in reasonable time. The sources to "rm" are available upon request.

To avoid having to copy the tree multiple times, this test was run only once for each command. Since the multiple-run tests show little variance, these runs aren't expected to show much variance either.

    find tmp:mg -name *.o -exec rm "{}" ";"
        time: 53.76

    find tmp:mg -exec rm "{}" ";"
        time: 150.68

Note: Find doesn't support AmigaDOS wildcarding.
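To make the two-pass deletion and the argument-stacking point concrete, here is a minimal Python sketch over a hypothetical in-memory tree (no real files are touched). It walks the tree with an explicit list of pending entries instead of recursion, can emit each directory after its contents (a postorder walk, which the second pass needs because a directory can only be removed once it is empty), and groups the selected names into command lines. The tree and the walk, batch and cleanup_commands names, as well as the max_args limit, are illustrative assumptions; this is not treewalk's code.

```python
from fnmatch import fnmatch

# Hypothetical stand-in for the tmp:mg tree: a dict is a directory, None is a file.
# walk, batch and cleanup_commands are illustrative names, not treewalk's code.
TREE = {
    "main.c": None,
    "main.o": None,
    "sys": {
        "amiga": {"tty.c": None, "tty.o": None},
        "readme": None,
    },
}


def walk(root, tree, postorder=False):
    """Yield (path, is_dir) for root and everything below it, without recursion.

    Pending entries are kept on an explicit list rather than the call stack.
    With postorder=True a directory is yielded only after all of its contents.
    """
    pending = [(root, tree, False)]
    while pending:
        path, node, revisit = pending.pop()
        if not isinstance(node, dict):
            yield path, False                          # plain file
        elif revisit:
            yield path, True                           # postorder: contents done
        else:
            if postorder:
                pending.append((path, node, True))     # come back after children
            else:
                yield path, True                       # preorder: directory first
            # Push in reverse so entries come off the list in sorted order.
            for name in sorted(node, reverse=True):
                pending.append((path + "/" + name, node[name], False))


def batch(names, max_args=None):
    """Split names into argument lists of at most max_args (None means no limit)."""
    names = list(names)
    if not names:
        return []
    if max_args is None:
        return [names]
    return [names[i:i + max_args] for i in range(0, len(names), max_args)]


def cleanup_commands(root, tree, command="rm", max_args=None):
    """Command lines for the two passes: *.o files first, then everything else."""
    objects = [path for path, is_dir in walk(root, tree)
               if not is_dir and fnmatch(path.rsplit("/", 1)[-1], "*.o")]
    already_gone = set(objects)
    # Postorder, so every directory follows the entries it contains.
    rest = [path for path, _ in walk(root, tree, postorder=True)
            if path not in already_gone]
    return [[command] + args
            for args in batch(objects, max_args) + batch(rest, max_args)]


for line in cleanup_commands("tmp:mg", TREE):
    print(" ".join(line))
```

With no limit, the sample tree needs two rm invocations, one per pass; passing max_args=1, which mimics the one-copy-of-rm-per-file behavior of find's -exec, turns that into eight. Unlike treewalk, this sketch also lists the top-level directory for removal.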
Note: Find failed to delete any directories during the second phase of the trial, even though it deleted all regular files.

    treewalk dir tmp:mg filter "filename#='#?.o'" rm
        time: 24.40

    treewalk post dir tmp:mg rm
        time: 50.66

Note: to ensure that directories are seen after the files they contain, treewalk needs to be told to do a postorder traversal of the tree during the second phase.

Note: treewalk did not delete the top-level directory, but this is to be expected from its documentation.

    files -rerase -name #?.o tmp:mg
        time: 353.14

    files -rerase tmp:mg
        time: 159.88

Note: files complains that it can't delete certain directories during the first phase. This is odd and somewhat annoying.

As a couple of asides, I ran the filtered treewalk file removal while forcing treewalk to run a separate copy of rm for each file to delete (the same behavior that find uses), to gain some measure of how important the ability to stack file names on a command is. I then ran the full delete using the standard AmigaDOS delete command, to see how that compared with the other cases.

    treewalk dir tmp:mg filter "filename#='#?.o'" single rm
        time: 43.38

    delete tmp:mg all quiet
        time: 32.64

Final note: program sizes.

    find      13044  ----rwed  19-Apr-89  02:29:43
    treewalk  19904  --p-rwed  Today      21:37:31
    files     24096  --p-rwed  19-Apr-89  02:30:22

Some statistics: running time as a percentage of the slowest program. Times for the multiple-run tests are the averages; the <aside> column uses the two aside runs above (single-rm treewalk for the filtered pass, AmigaDOS delete for the unfiltered one).

    test            files         find  treewalk  <aside>
    1 output        100           48    51
    1 no output     100           17    29
    2               not possible  99    100
    3 filtered      100           15    7         12
    3 unfiltered    100           94    32        20
    total 1 & 3     100           38    26

Conclusions: It should be clear that files is the least worthwhile tool of the lot. It's far slower than either of the other two, not as flexible, and much larger. Its inability to distinguish between an empty directory and a directory with too many files is a serious handicap for unattended use on large devices. That source is available for the other two tools, but not for files, doesn't help.
Finally, its insistence on blaming Lattice for its existence every time it starts just adds insult to injury.

Find appears to do the actual directory scanning faster than treewalk, but does most everything else slower. Possibly moving to a newer compiler technology would change this lack of speed. However, its inability to execute a command with multiple file arguments seems to be a major performance hit; that appears to be inherent in its user interface, and not solvable without a major redesign (i.e., treewalk). It is less flexible than treewalk, lacking the ability to do things like select all files that were last modified on a particular date. However, it is smaller, which could be a benefit in disk-tight situations.

Treewalk does OK for speed, but not wonderfully. In particular, if there is no filtering and the output is going to memory instead of the console, it runs at slightly better than half the speed of find. This is probably due to 1) not using the stack to store the visitation history, so as to avoid consuming a vital resource, and 2) using a general treewalking routine instead of one that's inseparable from the program. However, its ability to select which files to process is better than that of either alternative. In particular, rather than choosing a small set of primitives about files and hardwiring them into the program, it allows the user to access the data in each file's FileInfoBlock and manipulate it via C-like expressions. The added ability to use ARexx macros as primitives is of unknown utility, but it does allow treewalk to mimic find's multiple-exec and '-ok' features.

The bottom line is that there is no technical reason to use files. Find may be preferable in some cases, but treewalk is probably to be preferred in the general case.

Copyright 1989, Mike W. Meyer
These files may be used and redistributed under the terms found in the file LICENSE.